Different approaches for identifying important concepts in probabilistic biomedical text summarization
Abstract
Automatic text summarization tools help users in the biomedical domain acquire the information they need from various textual resources more efficiently. Some biomedical text summarization systems base their sentence selection on the frequency of concepts extracted from the input text. However, measures other than raw frequency for identifying valuable content within an input document, or measures that consider correlations between concepts, may be more useful for this type of summarization. In this paper, we describe a Bayesian summarization method for biomedical text documents. The Bayesian summarizer first maps the input text to Unified Medical Language System (UMLS) concepts; it then selects the important ones to be used as classification features. We introduce six different feature selection approaches to identify the most important concepts of the text and select the most informative content according to the distribution of these concepts. We show that, with an appropriate feature selection approach, the Bayesian summarizer can improve the performance of biomedical summarization. Using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) toolkit, we perform extensive evaluations on a corpus of scientific papers in the biomedical domain. The results show that when the Bayesian summarizer uses feature selection methods that do not rely on raw frequency, it can outperform biomedical summarizers that rely on concept frequency, as well as domain-independent and baseline methods.
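The abstract does not spell out the six feature selection approaches or the details of the Bayesian classifier, so the following is only an illustrative sketch of the general idea of ranking concepts by something other than raw frequency. It assumes the text has already been mapped to concepts (the IDs "A" to "D" are placeholders, not real UMLS identifiers), uses a concept-frequency times inverse-sentence-frequency weight as one plausible non-raw-frequency measure, and ranks sentences by a simple sum of selected-concept weights as a stand-in for the classifier. All function names are hypothetical.

```python
import math
from collections import Counter

def cf_isf_weights(sentences):
    """Weight each concept by concept frequency * inverse sentence frequency.

    Assumption: this measure is an illustrative choice, not necessarily
    one of the six approaches evaluated in the paper.
    """
    n = len(sentences)
    cf = Counter(c for s in sentences for c in s)        # total occurrences
    sf = Counter(c for s in sentences for c in set(s))   # sentences containing c
    return {c: cf[c] * math.log(n / sf[c]) for c in cf}

def select_concepts(weights, k):
    """Keep the k highest-weighted concepts as features (ties broken by name)."""
    ranked = sorted(weights, key=lambda c: (-weights[c], c))
    return set(ranked[:k])

def rank_sentences(sentences, features, weights):
    """Order sentence indices by the summed weight of the selected concepts they contain.

    Assumption: a plain weighted sum stands in for the Bayesian classifier.
    """
    scores = [sum(weights[c] for c in set(s) if c in features) for s in sentences]
    return sorted(range(len(sentences)), key=lambda i: (-scores[i], i))

# Each sentence is a list of placeholder concept IDs.
doc = [
    ["A", "B"],
    ["A", "C", "C"],
    ["D"],
    ["A", "B", "C"],
    ["A"],
]
weights = cf_isf_weights(doc)
features = select_concepts(weights, k=2)
order = rank_sentences(doc, features, weights)
```

In this toy document, concept "A" has the highest raw frequency but the lowest weight, because it appears in almost every sentence, so it is not selected as a feature. A summary of a given length would then take the first few indices in `order`.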
Similar resources
Semantic Annotation and Summarization of Biomedical Literature
Lawrence Harold Reeve, Jr.; Hyoil Han, Ph.D. Advancements in the biomedical community are largely documented and published in text format in scientific forums such as conferences and journals. To address the scalability of utilizing the large volume of text-based information generated by continuing advances in the biomedical field, t...
BioChain: Using Lexical Chaining Methods for Biomedical Text Summarization
Lexical chaining is a technique for identifying semantically-related terms in a text. It is useful in document summarization in order to identify the top sentences most likely to contain the main ideas of a document or document set. These top sentences are then extracted and combined in order to produce a summary of the document(s). To date, summarization work using lexical chains ha...
A survey on Automatic Text Summarization
Text summarization endeavors to produce a summary version of a text, while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to decipher through such massive amount of data, in order to extract the useful information, is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...
Text Summarization in Data Mining
Text summarizers automatically construct summaries of a natural-language document. This paper examines the use of text summarization within data mining, identifying the potential summarizers have for uncovering interesting and unexpected information. It describes the current state of the art in commercial summarization and current approaches to the evaluation of summarizers. The paper then propo...
Resolving ambiguity in biomedical text to improve summarization
Access to the vast body of research literature that is now available on biomedicine and related fields can be improved with automatic summarization. This paper describes a summarization system for the biomedical domain that represents documents as graphs formed from concepts and relations in the UMLS Metathesaurus. This system has to deal with the ambiguities that occur in biomedical documents....
Journal:
- Artificial Intelligence in Medicine
Volume: 84
Publication date: 2018